Using Inverse Lexical Rules to Acquire a Wide-coverage Lexicalized Grammar
Abstract
Automatic grammar extraction from annotated corpora (Xia, 1999; Chen and Vijay-Shanker, 2000; Chiang, 2000; Hockenmaier and Steedman, 2002; Miyao et al., 2004) has made it possible to build a wide-coverage lexicalized grammar at low cost. These methods extract a large number of lexical entries with little effort, whereas conventional methods can acquire only a limited set of lexical entries from real-world texts. Lexicalized grammars require many lexical entries to explain various syntactic alternations, and we can hardly expect every word to appear in every possible syntactic alternation within a limited training corpus. In this work, we aimed to improve the coverage of an automatically extracted grammar by using lexical rules (Jackendoff, 1975; Pollard and Sag, 1994). The idea behind lexical rules is that the syntactic constraints of a group of words are derived by general rules from their lexemes, which express characteristics common to the group (e.g. “runs” and “running” are derived from the lexeme “run”). We automatically acquired lexemes by applying lexical rules inversely to the lexical entries of an HPSG grammar extracted automatically from the Penn Treebank (Marcus et al., 1993). We could then generate a wide set of lexical entries from the lexemes, and our grammar achieved higher coverage of real-world texts. Although the lexical rules proposed by Pollard and Sag (1994) treat several parts of speech, such as nouns and adjectives, we formulated rules only for verbs. This is because verbs ...
[Figure: lexical entry for “scold” as a feature structure. PHON “scold”; CAT: HEAD verb, MODL null, MODR null, SUBJ < HEAD noun >, COMPS < HEAD noun >; NONLOCAL: SLASH <>]
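The inverse use of lexical rules described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the `Entry` class, the single passive rule, the `"base"`/`"passive"` labels, and the treatment of a base-form entry as the "lexeme" are all simplifying assumptions made for illustration. The sketch recovers lexemes from observed entries by running a rule backwards, then runs the rule forwards to generate entries that were never observed.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple


@dataclass(frozen=True)
class Entry:
    """Toy lexical entry (hypothetical, far simpler than an HPSG sign)."""
    lemma: str
    form: str                 # "base" or "passive" (illustrative labels)
    subj: Tuple[str, ...]     # categories required as subject
    comps: Tuple[str, ...]    # categories required as complements


def passive(e: Entry) -> Optional[Entry]:
    """Forward rule: a base verb with a noun object yields a passive entry."""
    if e.form == "base" and e.comps and e.comps[0] == "noun":
        return Entry(e.lemma, "passive", ("noun",), e.comps[1:])
    return None


def passive_inverse(e: Entry) -> Optional[Entry]:
    """Inverse rule: a passive entry implies a base entry with a noun object."""
    if e.form == "passive":
        return Entry(e.lemma, "base", ("noun",), ("noun",) + e.comps)
    return None


# Each rule is paired with its inverse; a real grammar would have many rules.
RULES = [(passive, passive_inverse)]


def acquire_lexemes(observed: Set[Entry]) -> Set[Entry]:
    """Collect base entries, including those recovered via inverse rules."""
    lexemes = set()
    for e in observed:
        if e.form == "base":
            lexemes.add(e)
        for _, inverse in RULES:
            base = inverse(e)
            if base is not None:
                lexemes.add(base)
    return lexemes


def expand(lexemes: Set[Entry]) -> Set[Entry]:
    """Apply every forward rule to every lexeme to generate entries."""
    grammar = set(lexemes)
    for lex in lexemes:
        for forward, _ in RULES:
            derived = forward(lex)
            if derived is not None:
                grammar.add(derived)
    return grammar


# "scold" was seen only in the active, "praise" only in the passive.
observed = {
    Entry("scold", "base", ("noun",), ("noun",)),
    Entry("praise", "passive", ("noun",), ()),
}
lexemes = acquire_lexemes(observed)
grammar = expand(lexemes)
new_entries = grammar - observed  # entries the corpus never showed
```

In this toy setting, `new_entries` holds the passive entry for "scold" and the base entry for "praise", neither of which was observed. That is the sense in which inverse rule application can raise coverage.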
Similar resources
Deriving Information Structure from Prosodically Marked Text with Lexicalized Tree Adjoining Grammars
This paper proposes a method for integrating intonation and information structure into the Lexicalized Tree Adjoining Grammar (LTAG) formalism. The method works fully within LTAG and requires no changes or additions to the basic formalism. From the existing CCG analysis, we denote boundary tones as lexical items and pitch accents as features of lexical items. We then show how prosodically marke...
Encoding Lexicalized Tree Adjoining Grammars with a Nonmonotonic Inheritance Hierarchy
This paper shows how DATR, a widely used formal language for lexical knowledge representation, can be used to define an LTAG lexicon as an inheritance hierarchy with internal lexical rules. A bottom-up featural encoding is used for LTAG trees and this allows lexical rules to be implemented as covariation constraints within feature structures. Such an approach eliminates the considerable redu...
Alpino: Wide-coverage Computational Analysis of Dutch
Alpino is a wide-coverage computational analyzer of Dutch which aims at accurate, full parsing of unrestricted text. We describe the head-driven lexicalized grammar and the lexical component, which has been derived from existing resources. The grammar produces dependency structures, thus providing a reasonably abstract and theory-neutral level of linguistic representation. An important aspect ...
Automating the Generation of a Wide-coverage LFG for French using a MetaGrammar
In this paper, we explain how the notion of MetaGrammar, which has successfully been used for generating wide-coverage tree adjoining grammars (TAGs) for various languages such as French (Abeillé et al. (1999)) and German (Gerdes (2002)), may be used to generate a wide-coverage Lexical Functional Grammar (LFG) for French. We first introduce the notion of MetaGrammar and present the tools we use...
Ambiguity Resolution for Machine Translation of Telegraphic Messages
Telegraphic messages with numerous instances of omission pose a new challenge to parsing in that a sentence with omission causes a higher degree of ambiguity than a sentence without omission. Misparsing induced by omissions has a far-reaching consequence in machine translation. Namely, a misparse of the input often leads to a translation into the target language which has incoherent meaning in ...